feat(elasticsearch): optional ingest_pipeline for bulk writes by GunaPalanivel · Pull Request #3252 · deepset-ai/haystack-core-integrations

GunaPalanivel · 2026-04-28T16:54:00Z

What

Adds an optional ingest_pipeline parameter to ElasticsearchDocumentStore. When set, write_documents and write_documents_async pass it as the Elasticsearch bulk API pipeline argument so ingest pipelines (including inference processors) can run at index time. Default behavior is unchanged (None).

Why

Fixes #2940 (Index-time inference in ElasticsearchDocumentStore, optional / deferred).

Users who index into Elasticsearch with a pre-defined ingest pipeline had no way to attach that pipeline from Haystack. The parent discussion in #699 defers broader retrieval work; this change is the narrow index-time hook only.

How

New ingest_pipeline on ElasticsearchDocumentStore.__init__: non-empty strings only (after strip); whitespace-only raises ValueError.
helpers.bulk / helpers.async_bulk receive pipeline=... only when the value is set, so deletes and existing callers are unaffected.
to_dict / from_dict include the field like sparse_vector_field (missing key deserializes to None).
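
The three points above can be sketched as follows. This is a minimal illustration, not the integration's actual code: the helper names `normalize_ingest_pipeline`, `bulk_kwargs`, and `pipeline_from_init_params` are hypothetical. Storing the stripped value and rejecting the empty string are assumptions, since the description only says validation happens after strip.

```python
from typing import Any, Optional


def normalize_ingest_pipeline(ingest_pipeline: Optional[str]) -> Optional[str]:
    """Validate the constructor argument; None means no pipeline is configured."""
    if ingest_pipeline is None:
        return None
    stripped = ingest_pipeline.strip()
    if not stripped:
        # Empty or whitespace-only names are rejected up front.
        raise ValueError("ingest_pipeline must be a non-empty string when provided.")
    # Assumption: the stripped name is what gets stored.
    return stripped


def bulk_kwargs(ingest_pipeline: Optional[str]) -> dict[str, Any]:
    """Extra keyword arguments for the bulk helper call."""
    # `pipeline` is added only when set, so stores without it call bulk as before.
    return {"pipeline": ingest_pipeline} if ingest_pipeline is not None else {}


def pipeline_from_init_params(init_params: dict[str, Any]) -> Optional[str]:
    """Read the field during deserialization; a missing key yields None."""
    return init_params.get("ingest_pipeline")
```

With this shape, a write path would call something like `helpers.bulk(client, actions, **bulk_kwargs(name))`, leaving delete operations and stores created without a pipeline on the old code path.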

Testing

Unit tests cover serialization (test_to_dict, from_dict variants), validation, and mocked bulk calls to assert pipeline is passed or omitted. Retriever to_dict / from_dict expectations were updated for nested document store serialization.
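
The mocked-bulk assertions might look roughly like the sketch below. It is an assumption-laden illustration: `write_with_optional_pipeline` is a hypothetical stand-in for the store's write path, and the real tests presumably patch the Elasticsearch bulk helper rather than passing a mock in directly.

```python
from unittest.mock import MagicMock


def write_with_optional_pipeline(bulk, client, actions, ingest_pipeline=None):
    """Hypothetical stand-in for the write path; not the integration's code."""
    kwargs = {}
    if ingest_pipeline is not None:
        # Forward the pipeline name only when one is configured.
        kwargs["pipeline"] = ingest_pipeline
    return bulk(client, actions, **kwargs)


def test_pipeline_is_forwarded_when_set():
    bulk = MagicMock(return_value=(1, []))
    write_with_optional_pipeline(bulk, MagicMock(), [{"_id": "1"}], ingest_pipeline="my-pipeline")
    assert bulk.call_args.kwargs["pipeline"] == "my-pipeline"


def test_pipeline_is_omitted_when_unset():
    bulk = MagicMock(return_value=(1, []))
    write_with_optional_pipeline(bulk, MagicMock(), [{"_id": "1"}])
    assert "pipeline" not in bulk.call_args.kwargs
```

Asserting on the absence of the keyword, not just on a `None` value, is what guards the "default behavior is unchanged" claim.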

To reproduce locally:

cd integrations/elasticsearch
hatch run test:unit
hatch run test:types

For formatting on Windows, if hatch run fmt fails on path handling, use:

hatch -e default run ruff check --fix <paths>
hatch -e default run ruff format <paths>

Test output:

117 passed, 182 deselected (unit tests on current main)

Type check:

Success: no issues found in 10 source files

Trade-offs

No per-call pipeline override on write_documents (only store-level). Can be added later if needed.
Pipeline creation and mapping design stay in Elasticsearch; Haystack does not validate that the pipeline exists.

github-actions · 2026-04-28T16:59:15Z

Coverage report (elasticsearch)

File	Lines missing
integrations/elasticsearch/src/haystack_integrations/components/retrievers/elasticsearch
  bm25_retriever.py
  embedding_retriever.py
integrations/elasticsearch/src/haystack_integrations/document_stores/elasticsearch
  document_store.py	373-383, 406-416, 480
  filters.py

This report was generated by python-coverage-comment-action.

davidsbatista · 2026-04-30T14:39:26Z

@GunaPalanivel, you did not add any integration tests against the Elasticsearch cloud instance.

I will do it and can take it over from here.

GunaPalanivel · 2026-04-30T14:43:42Z

> @GunaPalanivel, you did not add any integration tests against the Elasticsearch cloud instance.
>
> I will do it and can take it over from here.

@davidsbatista - if that is fine with you, I'll add the integration tests against the Elasticsearch cloud instance.

davidsbatista · 2026-04-30T14:46:05Z

> > @GunaPalanivel, you did not add any integration tests against the Elasticsearch cloud instance.
> > I will do it and can take it over from here.
>
> @davidsbatista - if that is fine with you, I'll add the integration tests against the Elasticsearch cloud instance.

No need, thanks, your contribution was already helpful!

feat(elasticsearch): optional ingest_pipeline for bulk index writes

85f8604

Pass Elasticsearch bulk pipeline when set so ingest pipelines (e.g. inference) run at index time. Serialize in to_dict; omit when unset. Extend retriever serialization tests. Fixes deepset-ai#2940 Made-with: Cursor

GunaPalanivel requested a review from a team as a code owner April 28, 2026 16:54

GunaPalanivel requested review from anakin87 and removed request for a team April 28, 2026 16:54

github-actions Bot added type:documentation integration:elasticsearch and removed type:documentation labels Apr 28, 2026

test(elasticsearch): raise unit coverage above 60%

1f0c81b

Add filter edge-case tests (comparisons, ranges, in/not in) and retriever init validation tests. filters.py reaches 100% line coverage; combined package unit coverage ~63%. Made-with: Cursor

github-actions Bot added the type:documentation label Apr 28, 2026

anakin87 requested a review from davidsbatista April 29, 2026 09:51

davidsbatista added 2 commits April 29, 2026 16:10

Merge branch 'main' into feat/2940-elasticsearch-ingest-pipeline

d3870c5

Merge branch 'main' into feat/2940-elasticsearch-ingest-pipeline

a2d07e2

anakin87 removed their request for review April 30, 2026 10:53

adding integration tests over Elasticsearch Cloud

28045dc

davidsbatista added 7 commits April 30, 2026 16:51

fixing env vars for elastic search cloud

fe46bab

correcting tests

365e7fd

Adding deserialisation test considering ES returning sparse embeddings

f187182

Merge branch 'main' into feat/2940-elasticsearch-ingest-pipeline

de85c37

fields_hit[sparse_field][0] would crash with IndexError if ES returned an empty list

68b1a32

Merge branch 'main' into feat/2940-elasticsearch-ingest-pipeline

948bb27

fixing conftest from merge + adding missing tests/test_inference_hybrid_retriever.py

c01fa01

GunaPalanivel wants to merge 12 commits into deepset-ai:main from GunaPalanivel:feat/2940-elasticsearch-ingest-pipeline
